Towards Structural Logistic Regression: Combining Relational and Statistical Learning
Authors
Abstract
Inductive logic programming (ILP) techniques are useful for analyzing data in multi-table relational databases. Learned rules can potentially discover relationships that are not obvious in "flattened" data. Statistical learners, on the other hand, are generally not constructed to search relational data; they expect to be presented with a single table containing a set of feature candidates. However, statistical learners often yield more accurate models than the logical forms of ILP, and can better handle certain types of data, such as counts. We propose a new approach that integrates structure navigation from ILP with regression modeling. Our approach propositionalizes the first-order rules at each step of ILP's relational structure search, generating features for potential inclusion in a regression model. Ideally, feature generation by ILP and feature selection by stepwise regression should be integrated into a single loop. Preliminary results for scientific literature classification are presented using a relational form of the data extracted by ResearchIndex (formerly CiteSeer). We use FOIL and logistic regression as our ILP and statistical components (decoupled at this stage). Word counts and citation-based features learned with FOIL are modeled together by logistic regression. The combination often significantly improves performance when high-precision classification is desired.
Comments: Presented at the 1st Workshop on Multi-Relational Data Mining (MRDM 2002), held at KDD-2002. This conference paper is available at ScholarlyCommons: http://repository.upenn.edu/cis_papers/134
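The decoupled pipeline described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy `cites` and `published_in` tables, the single hand-written rule standing in for a rule that FOIL would learn, the helper names, and the plain gradient-descent fit used in place of a real logistic regression package. It is not the authors' implementation and it reports only training loss, not the precision results from the paper.

```python
import math

# Hypothetical toy relational data (not from the paper).
cites = [("d1", "c1"), ("d2", "c2"), ("d3", "c1"), ("d4", "c3"), ("d5", "c3")]
published_in = {"c1": "ICML", "c2": "ICML", "c3": "SIGMOD"}
word_count = {"d1": 4, "d2": 0, "d3": 2, "d4": 1, "d5": 3, "d6": 0}  # count of one word
label = {"d1": 1, "d2": 1, "d3": 1, "d4": 0, "d5": 0, "d6": 0}
doc_ids = sorted(label)


def cites_venue(doc, venue):
    """Propositionalize the rule  cites(D, E), published_in(E, venue)  as a 0/1 feature."""
    return int(any(src == doc and published_in.get(dst) == venue for src, dst in cites))


def sigmoid(z):
    # Numerically stable logistic function.
    return 1.0 / (1.0 + math.exp(-z)) if z >= 0 else math.exp(z) / (1.0 + math.exp(z))


def log_loss(weights, rows, ys):
    """Mean negative log-likelihood of the labels under the logistic model."""
    total = 0.0
    for x, y in zip(rows, ys):
        p = sigmoid(sum(w * v for w, v in zip(weights, x)))
        total -= math.log(p) if y == 1 else math.log(1.0 - p)
    return total / len(ys)


def fit_logistic(rows, ys, lr=0.1, steps=2000):
    """Fit logistic regression by full-batch gradient descent, starting from zero weights."""
    weights = [0.0] * len(rows[0])
    for _ in range(steps):
        grad = [0.0] * len(weights)
        for x, y in zip(rows, ys):
            err = sigmoid(sum(w * v for w, v in zip(weights, x))) - y
            for j, v in enumerate(x):
                grad[j] += err * v / len(ys)
        weights = [w - lr * g for w, g in zip(weights, grad)]
    return weights


ys = [label[d] for d in doc_ids]
rule_feature = [cites_venue(d, "ICML") for d in doc_ids]

# Design matrices: intercept + word count, then the same plus the relational feature.
words_only = [[1.0, float(word_count[d])] for d in doc_ids]
combined = [row + [float(r)] for row, r in zip(words_only, rule_feature)]

words_loss = log_loss(fit_logistic(words_only, ys), words_only, ys)
combined_loss = log_loss(fit_logistic(combined, ys), combined, ys)
print(rule_feature, round(words_loss, 3), round(combined_loss, 3))
```

In this toy data the citation-based feature separates the classes while the word count alone does not, so the combined model reaches a lower training loss. In the integrated loop the abstract describes as the ideal, each candidate rule produced during the relational search would be propositionalized this way and offered to a stepwise selection procedure instead of being fixed in advance.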
Similar resources
Bridging Weighted Rules and Graph Random Walks for Statistical Relational Models
The aim of statistical relational learning is to learn statistical models from relational or graph-structured data. Three main statistical relational learning paradigms include weighted rule learning, random walks on graphs, and tensor factorization. These paradigms have been mostly developed and studied in isolation for many years, with few works attempting at understanding the relationship am...
RelNN: A Deep Neural Model for Relational Learning
Statistical relational AI (StarAI) aims at reasoning and learning in noisy domains described in terms of objects and relationships by combining probability with first-order logic. With huge advances in deep learning in the current years, combining deep networks with first-order logic has been the focus of several recent studies. Many of the existing attempts, however, only focus on relations an...
Structural Logistic Regression for Link Analysis
We present Structural Logistic Regression, an extension of logistic regression to modeling relational data. It is an integrated approach to building regression models from data stored in relational databases in which potential predictors, both boolean and real-valued, are generated by structured search in the space of queries to the database, and then tested with statistical information criteri...
Determination of Financial Failure Indicators by Gray Relational Analysis and Application of Data Envelopment Analysis and Logistic Regression Analysis in BIST 100 Index
Financial failure prediction models have been developed by using Logistic Regression (LR) analysis from traditional statistical methods and Data Envelopment Analysis (DEA), which is a mathematically based nonparametric method over the financial reports of the companies traded in The Istanbul Stock Exchange National 100 Index (BIST 100) between the years 2014-2016. In the development of these mo...
Stochastic Gradient Descent for Relational Logistic Regression via Partial Network Crawls
Research in statistical relational learning has produced a number of methods for learning relational models from largescale network data. While these methods have been successfully applied in various domains, they have been developed under the unrealistic assumption of full data access. In practice, however, the data are often collected by crawling the network, due to proprietary access, limite...
Publication date: 2016